Abstract

In this paper, we present and analyze a simple and robust spectral algorithm for the
stochastic block model with $k$ blocks, for any fixed $k$. Our algorithm works with graphs
having constant edge density, under an optimal condition on the gap between the density
inside a block and the density between the blocks. As a by-product, we settle an open question
posed by Abbe et al. concerning censor block models.


1 Introduction 


Community detection is an important problem in statistics, theoretical computer science and
image processing. A widely studied theoretical model in this area is the stochastic block model. In
the simplest case, there are two blocks $V_1, V_2$, each of size $n$; one considers a random graph
generated from the following distribution: an edge between vertices belonging to the same block
appears with probability $\frac{a}{n}$ and an edge between vertices in different blocks appears with
probability $\frac{b}{n}$, where $a > b > 0$. Given an instance of this graph, we would like to identify the two
blocks as correctly as possible. Our paper will deal with the general case of $k \geq 2$ blocks, but for
the sake of simplicity, let us first focus on $k = 2$.

For $k = 2$, the problem can be seen as a variant of the well known hidden bipartition problem,
which has been studied by many researchers in theoretical computer science, starting with the
work of [BCLS87] (see [DF89, Bop87, JS93, McS01] and the references therein for further
developments). In these earlier papers, $a$ and $b$ are large (at least $\log n$) and the goal is to recover
both blocks completely. It is known that one can efficiently obtain a complete recovery if
$\frac{(a-b)^2}{a+b} > C \log n$ and $a, b > C \log n$ for some sufficiently large constant $C$ (see, for instance, [Vu14]).

In the stochastic block model problem, the graph is sparse with $a$ and $b$ being constants.
Classical results from random graph theory tell us that in this range the graph contains, with high
probability, a linear portion of isolated vertices [Bol01]. Apparently, there is no way to tell these
vertices apart and so a complete recovery is out of the question. The goal here is to recover a large
portion of each block, namely finding a partition $V_1' \cup V_2'$ of $V = V_1 \cup V_2$ such that $V_i$ and $V_i'$ are
close to each other. For quantitative purposes, let us introduce a definition.

Definition 1.1 A collection of subsets $V_1', V_2'$ of $V_1 \cup V_2$ is $\gamma$-correct if $|V_i \cap V_i'| \geq (1 - \gamma)n$.

In [CO10], Coja-Oghlan proved


Theorem 1.2 For any constant $\gamma > 0$ there are constants $d_0, C > 0$ such that if $a, b > d_0$ and
$\frac{(a-b)^2}{a+b} > C \log(a + b)$, one can find a $\gamma$-correct partition using a polynomial time algorithm.


Coja-Oghlan proved Theorem 1.2 as part of a more general problem, and his algorithm was
rather involved. Furthermore, the result is not yet sharp and it has been conjectured that the log
term is removable.* Even when the log term is removed, an important question is to find out the
optimal relation between the accuracy $\gamma$ and the ratio $\frac{(a-b)^2}{a+b}$. This is the main goal of this paper.


Theorem 1.3 There are constants $C_0$ and $C_1$ such that the following holds. For any constants
$a > b > C_0$ and $\gamma > 0$ satisfying

$$\frac{(a-b)^2}{a+b} > C_1 \log\frac{1}{\gamma},$$

we can find a $\gamma$-correct partition with probability $1 - o(1)$ using a simple spectral algorithm.


The constants $C_0, C_1$ can be computed explicitly via careful, but rather tedious, bookkeeping.
We do not try to optimize these constants, in order to simplify the presentation. The proof of Theorem 1.3
yields the following corollary.


Corollary 1.4 There are constants $C_0$ and $\epsilon$ such that the following holds. For any constants
$a > b > C_0$ and $\epsilon > \gamma > 0$ satisfying

$$\frac{(a-b)^2}{a+b} > 8.1 \log\frac{1}{\gamma},$$

we can find a $\gamma$-correct partition with probability $1 - o(1)$ using a simple spectral algorithm.


In parallel to our study, a minimax rate result was proved which suggests that there is a
constant $c > 0$ such that if

$$\frac{(a-b)^2}{a+b} < c \log\frac{1}{\gamma},$$

then one cannot recover a $\gamma$-correct partition (in expectation), regardless of the algorithm.

In order to prove Theorem 1.3, we design a fast and robust algorithm which obtains a $\gamma$-correct
partition under the condition $\frac{(a-b)^2}{a+b} > C \log\frac{1}{\gamma}$. Our algorithm guarantees $\gamma$-correctness with high
probability.

We can refine the algorithm to handle the (more difficult) general case of having $k$ blocks,
for any fixed number $k$. Suppose now there are $k$ blocks $V_1, \ldots, V_k$ with $|V_i| = \frac{n}{k}$, with edge
probabilities $\frac{a}{n}$ between vertices within the same block and $\frac{b}{n}$ between vertices in different blocks.
As before, a collection of subsets $V_1', V_2', \ldots, V_k'$ of $V_1 \cup V_2 \cup \ldots \cup V_k$ is $\gamma$-correct if
$|V_i \cap V_i'| \geq (1 - \gamma)\frac{n}{k}$.


*We would like to thank E. Abbe for communicating this conjecture. 
Theorem 1.5 There exist constants $C_1, C_2$ such that if $k$ is any constant as $n \to \infty$ and if

1. $a > b > C_1$

2. $(a - b)^2 > C_2 k^2 a \log\frac{1}{\gamma}$

then we can find a $\gamma$-correct partition with probability at least $1 - o(1)$ using a simple spectral
algorithm.


We believe that this result is sharp, up to the values of $C_1$ and $C_2$; in particular, the requirement
$\frac{(a-b)^2}{a} = \Omega(k^2)$ is optimal.

Our method also works (without significant changes) in the case where the blocks are not equal, but
have comparable sizes (say $cn > |V_i| > n$ for some constant $c > 1$). In this case, the constants
$C_1, C_2$ above will also depend on $c$. While the emphasis of this paper is on the case where $a, b$ are
constants, we would like to point out that this assumption is not required in our theorems, so our
algorithms work on denser graphs as well.

Let us now discuss some recent works, which we learned of only after posting the first version
of this paper on arXiv. Mossel informed us about a recent result in [MNS13b] which is similar
to Theorem 1.3 (see [MNS13b, Theorem 5.3]). They give a polynomial time algorithm and prove
that there exists a constant $C$ such that if $(a-b)^2 > C(a+b)$ and $a, b$ are fixed as $n \to \infty$, then the
algorithm recovers an optimal fraction of the vertices. The algorithm in [MNS13b] is very different
from ours, and uses non-backtracking walks. This algorithm does not yet handle the case of more
than 2 blocks, and its analysis looks very delicate. Next, Guédon sent us [GV14], in which the
authors also proved a result similar to Theorem 1.3 under a stronger assumption $(a-b)^2 > C_1(a+b)$
(see Theorem 1.1 and Corollary 1.2 of [GV14]). Their approach relies on an entirely
different (semidefinite program) algorithm, which, in turn, was based on Grothendieck's inequality.
This approach seems to extend to the general $k > 2$ case; however, the formulation of the result
in this case, using matrix approximation, is somewhat different from ours (see [GV14, Theorem
1.3]). Two more closely related papers have been brought to our attention by the reviewers. In
[LMX13], the authors have worked out the spectral part of the result in this paper.

It is remarkable to see so much progress, using different approaches, on the same problem
in such a short span of time. This suggests that the problem is indeed important and rich, and it
will be instructive to study the performance of the existing algorithms in practice. We are
going to discuss the performance of our algorithm in Sections 3 and 4.

We next present an application of our method to the Censor Block Model studied by Abbe
et al. in [ABBS14]. As before, let $V$ be the union of two blocks $V_1, V_2$, each of size $n$. Let
$G = (V, E)$ be a random graph with edge probability $p$ and incidence matrix $B_G$, and let
$x = (x_1, \ldots, x_{2n})$ be the indicator vector of $V_1$. Let $z$ be a random noise vector whose coordinates $z_{e_i}$
are i.i.d. Bernoulli($\epsilon$) (taking value 1 with probability $\epsilon$ and 0 otherwise), where the $e_i$ are the edges of
$G$.

Given a noisy observation

$$y = B_G x \oplus z,$$

where $\oplus$ is addition mod 2, one would like to identify the blocks. In [ABBS14], the authors
proved that exact recovery ($\gamma = 0$) is possible if and only if
$\frac{np}{\log n} \geq \frac{2}{(1-2\epsilon)^2} + o\left(\frac{1}{(1-2\epsilon)^2}\right)$ in the
limit $\epsilon \to 1/2$. Further, they gave a semidefinite programming based algorithm which succeeds
up to twice the threshold. They posed the question of partial recovery ($\gamma > 0$) for sparse graphs.
Addressing this question, we show


Theorem 1.6 For any given constants $\gamma, 1/2 > \epsilon > 0$, there exist constants $C_1, C_2$ such that if
$np > \frac{C_1}{(1-2\epsilon)^2}$ and $p > \frac{C_2}{n}$, then we can find a $\gamma$-correct partition with probability $1 - o(1)$, using
a simple spectral algorithm.

Let us conclude this section by mentioning a related, interesting problem, where the purpose
is just to do better than a random guess (in our terminology, to find a partition which is $(1/2 + \epsilon)$-correct).
It was conjectured in [DKMZ11] that this is possible if and only if $(a - b)^2 > (a + b)$.
This conjecture has been settled recently by Mossel et al. [MNS12, MNS13a] and Massoulié
[Mas13]. Another closely related problem, which has been studied in [ABH14, MNS14], is about
when one can recover at least a $1 - o(1)$ fraction of the vertices.


The rest of the paper is organized as follows. In Section 2 we describe our algorithm for
Theorem 2.1 and give an overview of the proof. The full proof comes in Section 3. In Section 4 we
show how to modify the algorithm to handle the $k$ block case and prove Theorem 1.5. Finally, in
Section 5 we prove Theorem 1.6.

2 Two communities


We first consider the case $k = 2$. Our algorithm will have two steps. First we use a spectral
algorithm to recover a partition where the dependence between $\gamma$ and $\frac{(a-b)^2}{a+b}$ is sub-optimal.

Let $A_0$ denote the adjacency matrix of a random graph generated from the distribution as in
Theorem 2.1. Let $\bar{A}_0 := \mathbb{E}A_0$ and $E_0 := A_0 - \bar{A}_0$. Then $\bar{A}_0$ is a rank two matrix with the two
nonzero eigenvalues $\lambda_1 = a + b$ and $\lambda_2 = a - b$. The eigenvector $u_1$ corresponding to the eigenvalue
$a + b$ has coordinates

$$u_1(i) = \frac{1}{\sqrt{2n}} \quad \text{for all } i \in V$$

and the eigenvector $u_2$ corresponding to the eigenvalue $a - b$ has coordinates

$$u_2(i) = \begin{cases} \frac{1}{\sqrt{2n}} & \text{if } i \in V_1 \\ -\frac{1}{\sqrt{2n}} & \text{if } i \in V_2. \end{cases}$$
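The spectral structure of the expected matrix can be checked numerically. The following is a minimal sketch, assuming NumPy and assuming (for simplicity) that the diagonal entries of the expected matrix are also set to $a/n$; the helper name `expected_adjacency` and the parameter values are illustrative only.

```python
import numpy as np

def expected_adjacency(n, a, b):
    """Expected adjacency matrix for two blocks of size n.

    Assumption: diagonal entries are kept at a/n, which makes the matrix
    exactly rank two.
    """
    within = np.full((n, n), a / n)
    across = np.full((n, n), b / n)
    return np.block([[within, across], [across, within]])

n, a, b = 50, 10.0, 3.0
A_bar = expected_adjacency(n, a, b)

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(A_bar)
top_two = eigvals[-2:][::-1]      # largest first
u2 = eigvecs[:, -2]               # eigenvector of the second largest eigenvalue
```

Under these assumptions `top_two` should be close to `(a + b, a - b)`, and the sign pattern of `u2` should separate the two blocks.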


Spectral Partition.

1. Input the adjacency matrix $A_0$, $d := a + b$.

2. Zero out all the rows and columns of $A_0$ corresponding to vertices whose degree is bigger
than $20d$, to obtain the matrix $A$.

3. Find the eigenspace $W$ corresponding to the top two eigenvalues of $A$.

4. Compute $v_1$, the projection of the all-ones vector onto $W$.

5. Let $v_2$ be the unit vector in $W$ perpendicular to $v_1$.

6. Sort the vertices according to their values in $v_2$, and let $V_1' \subset V$ be the top $n$ vertices, and
$V_2' \subset V$ be the remaining $n$ vertices.

7. Output $(V_1', V_2')$.

Figure 1: Spectral Partition
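The steps of Figure 1 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `spectral_partition` is hypothetical, a dense eigendecomposition is used for brevity, and the sketch assumes the all-ones vector has a nonzero projection onto $W$.

```python
import numpy as np

def spectral_partition(A0, d):
    """Sketch of Figure 1. Returns a boolean mask marking the vertices of V_1'."""
    A = A0.astype(float).copy()
    high = A0.sum(axis=1) > 20 * d           # step 2: vertices of degree > 20d
    A[high, :] = 0.0
    A[:, high] = 0.0

    eigvals, eigvecs = np.linalg.eigh(A)     # step 3: ascending eigenvalues
    W = eigvecs[:, -2:]                      # orthonormal basis of the top-two eigenspace

    ones = np.ones(A.shape[0])
    c1 = W.T @ ones                          # step 4: coordinates of v_1 in the basis W
    c2 = np.array([-c1[1], c1[0]])           # step 5: direction in W perpendicular to v_1
    v2 = W @ (c2 / np.linalg.norm(c2))

    order = np.argsort(-v2)                  # step 6: sort by value in v_2
    half = A.shape[0] // 2
    in_first = np.zeros(A.shape[0], dtype=bool)
    in_first[order[:half]] = True
    return in_first                          # step 7: V_1' is the mask, V_2' its complement
```

For large sparse graphs one would more likely use a sparse eigensolver; the dense call above is only meant to keep the sketch short.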

Notice that the second eigenvector of $\bar{A}_0$ identifies the partition. We would like to use the
second eigenvector of $A_0$ to approximately identify the partition. Since $A_0 = \bar{A}_0 + E_0$, perturbation
theory tells us that we get a good approximation if $\|E_0\|$ is sufficiently small. However, with
probability $1 - o(1)$, the norm of $E_0$ is rather large (even larger than the norm of the main term).
In order to handle this problem, we modify $E_0$ using the auxiliary deletion, at the cost of losing a
few large degree vertices.

Let $\bar{A}, A, E$ be the matrices obtained from $\bar{A}_0, A_0, E_0$ after the deletion, respectively. Let
$\Delta := \bar{A} - \bar{A}_0$; we have

$$A = \bar{A} + E = \bar{A}_0 + \Delta + E.$$

The key observation is that $\|E\|$ is significantly smaller than $\|E_0\|$. In the next section we
will show that $\|E\| = O(\sqrt{d})$ with probability $1 - o(1)$, while $\|E_0\|$ is $\Theta\left(\sqrt{\frac{\log n}{\log\log n}}\right)$ with
probability $1 - o(1)$. Furthermore, we can show that $\|\Delta\|$ is only $O(1)$ with probability $1 - o(1)$.
Therefore, if the second eigenvalue gap for the matrix $\bar{A}_0$ is greater than $C\sqrt{d}$, for some large
enough constant $C$, then the Davis-Kahan $\sin\Theta$ theorem would allow us to bound the angle between
the second eigenvectors of $\bar{A}_0$ and $A$ by an arbitrarily small constant. This will, in turn, enable us
to recover a large portion of the blocks, proving the following statement.

Theorem 2.1 There are constants $C_0$ and $C_1$ such that the following holds. For any constants
$a > b > C_0$ and $\gamma > 0$ satisfying $\frac{(a-b)^2}{a+b} > C_1\frac{1}{\gamma^2}$, with probability $1 - o(1)$, Spectral
Partition outputs a $\gamma$-correct partition.

Remark 2.2 The parameter $d := a + b$ can be estimated very efficiently from the adjacency matrix.
We take this as input for a simpler exposition.


Partition

1. Input the adjacency matrix $A_0$, $d := a + b$.

2. Randomly color the edges with Red and Blue with equal probability.

3. Run Spectral Partition on the Red graph, outputting $V_1', V_2'$.

4. Run Correction on the Blue graph.

5. Output the corrected sets $V_1', V_2'$.

Figure 2: Partition

Step 2 is a further correction that gives us the optimal (logarithmic) dependence between $\gamma$ and
$\frac{(a-b)^2}{a+b}$. The idea here is to use the degree sequence to correct the mislabeled vertices. Consider a
mislabeled vertex $u \in V_1' \cap V_2$. As $u \in V_2$, we expect $u$ to have $b$ neighbors in $V_1$ and $a$ neighbors
in $V_2$. Assuming that Spectral Partition outputs $V_1', V_2'$ where $|V_i \setminus V_i'| \leq 0.1n$, we expect $u$ to have
at most $0.9b + 0.1a$ neighbors in $V_1'$ and at least $0.1b + 0.9a$ neighbors in $V_2'$. As

$$0.1b + 0.9a > \frac{a+b}{2} > 0.9b + 0.1a,$$

we can correctly reclassify $u$ by thresholding. There are,
however, a few problems with this argument. First, everything is in expectation. This turns out to be
a minor problem; we can use a large deviation result to show that a majority of mislabeled vertices
can be detected this way. As a matter of fact, the desired logarithmic dependence is achieved at
this step, thanks to the exponential probability bound in the large deviation result.

The more serious problem is the lack of independence. Once Spectral Partition has run, the
neighbors of $u$ are no longer random. We can avoid this problem using a splitting trick as given in
Partition. We randomly sample half of the edges of the input graph and use the graph formed by
them in Spectral Partition. After receiving the first partition, we use the other (random) half of
the edges for correction. This does not make the two steps completely independent, but we can still
prove the stated result.
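The splitting trick can be illustrated as follows. This is a minimal sketch, assuming a symmetric 0/1 adjacency matrix with zero diagonal; the function name `split_edges` is hypothetical, and each edge is colored by an independent fair coin.

```python
import numpy as np

def split_edges(A0, rng):
    """Color each edge Red or Blue with probability 1/2 each.

    Returns the (red, blue) adjacency matrices; every edge of A0 lands in
    exactly one of them.
    """
    upper = np.triu(A0, k=1)                  # one entry per undirected edge
    coin = rng.random(A0.shape) < 0.5         # independent fair coin per entry
    red_upper = upper * coin
    blue_upper = upper - red_upper
    return red_upper + red_upper.T, blue_upper + blue_upper.T
```

Spectral Partition would then be run on `red` and Correction on `blue`.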

The sub-routine Correction is as follows: 


Correction.

1. Input: a partition $V_1', V_2'$ and a Blue graph on $V_1' \cup V_2'$.

2. For any $u \in V_1'$, label $u$ bad if the number of neighbors of $u$ in $V_2'$ is at least $\frac{a+b}{4}$ and good
otherwise.

3. Do the same for any $v \in V_2'$.

4. Correct $V_i'$ by deleting its bad vertices and adding the bad vertices from $V_{3-i}'$.

Figure 3: Correction
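One pass of the Correction routine can be sketched as below. This is an illustrative sketch, not the authors' code: the function name `correction` is hypothetical, and the threshold is left as a parameter of the sketch rather than fixed to a particular value.

```python
import numpy as np

def correction(blue, in_first, threshold):
    """One pass of Figure 3.

    blue:      symmetric 0/1 adjacency matrix of the Blue graph
    in_first:  boolean mask, True for vertices currently in V_1'
    threshold: a vertex is 'bad' if it has at least this many Blue
               neighbours on the other side
    """
    to_second = blue[:, ~in_first].sum(axis=1)    # Blue neighbours in V_2'
    to_first = blue[:, in_first].sum(axis=1)      # Blue neighbours in V_1'
    bad_first = in_first & (to_second >= threshold)
    bad_second = ~in_first & (to_first >= threshold)

    corrected = in_first.copy()
    corrected[bad_first] = False                  # move bad vertices of V_1' to V_2'
    corrected[bad_second] = True                  # move bad vertices of V_2' to V_1'
    return corrected
```

For example, on two disjoint 4-cliques with one vertex of the first clique initially placed on the wrong side, a threshold of 2 moves that vertex back and leaves every other vertex in place.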

Figure 4 is the density plot of the matrix before and after clustering according to the algorithm
described above. We can prove

Lemma 2.3 Given a 0.1-correct partition $V_1', V_2'$ and a Blue graph on $V_1' \cup V_2'$ as input to the
subroutine Correction given in Figure 3, we get a $\gamma$-correct partition with $\gamma = 2\exp\left(-0.072\frac{(a-b)^2}{a+b}\right)$.


Figure 4: On the left is the density plot of the input (unclustered) matrix with parameters $n = 7500$,
$a = 10$, $b = 3$ and on the right is the density plot of the permuted matrix after running the
algorithm described above. This took less than 3 seconds in Matlab running on a 2009 MacPro.


3 First step: Proof of Theorem 2.1

We now turn to the details of the proof. Using the notation in the previous section, we let $W$
be the two dimensional eigenspace corresponding to the top two eigenvalues of $A$ and $\bar{W}$ be the
corresponding space of $\bar{A}_0$. For any two vector subspaces $W_1, W_2$ of the same dimension, we use the
usual convention $\sin\angle(W_1, W_2) := \|P_{W_1} - P_{W_2}\|$, where $P_{W_i}$ is the orthogonal projection onto
$W_i$. The proof has two main steps:

1. Bounding the angle: We show that $\sin\angle(\bar{W}, W)$ is small, under the conditions of the
theorem.

2. Recovering the partition: If $\sin\angle(\bar{W}, W)$ is small, we find an approximate partition which
can then be improved to find an optimal one.
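The quantity $\|P_{W_1} - P_{W_2}\|$ is straightforward to compute for explicit subspaces. The sketch below assumes each subspace is given by a matrix whose columns form a basis (full column rank); the function names `projector` and `sin_angle` are illustrative.

```python
import numpy as np

def projector(basis):
    """Orthogonal projection onto the column span of `basis` (full column rank assumed)."""
    q, _ = np.linalg.qr(basis)
    return q @ q.T

def sin_angle(basis1, basis2):
    """Spectral norm of the difference of the two orthogonal projections."""
    return np.linalg.norm(projector(basis1) - projector(basis2), ord=2)
```

For two lines in the plane at angle $t$, this returns $\sin t$, which matches the usual geometric meaning of the angle.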

3.1 Bounding the angle 

For the first part, recall that $A = \bar{A}_0 + \Delta + E$. We first prove that $\|\Delta\|$ and $\|E\|$ are small with
probability $1 - o(1)$. Bounding $\|\Delta\|$ is easy as it will be sufficient to bound the number of vertices
of high degree. We need the following

Lemma 3.1 There exists a constant $d_0$ such that if $d := a + b > d_0$, then with probability
$1 - \exp(-\Omega(a^{-2}n))$ not more than $a^{-3}n$ vertices have degree $> 20d$.

Note that the proof of the above lemma and other missing proofs in this subsection appear in
appendix A.1. If there are at most $a^{-3}n$ vertices with degree $> 20d$, then by definition, $\Delta$ has at
most $2a^{-3}n^2$ non-zero entries, and the magnitude of each entry is bounded by $\frac{a}{n}$. Therefore, its
Hilbert-Schmidt norm is bounded by $\|\Delta\|_{HS} \leq \sqrt{2}a^{-1/2}$.
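The Hilbert-Schmidt estimate can be spelled out in one line. The following is a short worked version, assuming (as the edge probabilities suggest) that every entry of $\Delta$ has magnitude at most $\frac{a}{n}$ and that $\Delta$ has at most $2a^{-3}n^2$ nonzero entries:

```latex
\|\Delta\| \;\le\; \|\Delta\|_{HS}
  \;=\; \Big(\sum_{i,j} \Delta_{ij}^2\Big)^{1/2}
  \;\le\; \Big(2a^{-3}n^2 \cdot \frac{a^2}{n^2}\Big)^{1/2}
  \;=\; \sqrt{2}\,a^{-1/2}.
```

In particular the right-hand side is at most 1 as soon as $a \geq 2$.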

Corollary 3.2 For $d_0$ sufficiently large, with probability $1 - \exp(-\Omega(a^{-3}n))$, $\|\Delta\| \leq 1$.

Now we address the harder task of bounding $\|E\|$. Here is the key lemma.


Lemma 3.3 Suppose $M$ is a random symmetric matrix with zero on the diagonal whose entries
above the diagonal are independent with the following distribution

$$M_{ij} = \begin{cases} 1 - p_{ij} & \text{w.p. } p_{ij} \\ -p_{ij} & \text{w.p. } 1 - p_{ij}. \end{cases}$$

Let $\sigma$ be a quantity such that $p_{ij} \leq \sigma^2$ and $M_1$ be the matrix obtained from $M$ by zeroing out
all the rows and columns having more than $20\sigma^2 n$ positive entries. Then with probability $1 - o(1)$,
$\|M_1\| \leq C\sigma\sqrt{n}$ for some constant $C > 0$.


Lemma 3.3 implies


Corollary 3.4 There exist constants $C_0, C$ such that if $a > b > C_0$, and $E$ is obtained as
described before, then we have

$$\|E\| \leq C\sqrt{d}$$

with probability $1 - o(1)$.


Now, let $\bar{v}_1, \bar{v}_2$ be eigenvectors of $\bar{A}_0$ corresponding to the largest two eigenvalues $\lambda_1 > \lambda_2$, and let
$v_1, v_2$ be eigenvectors of $A = \bar{A}_0 + \Delta + E$ corresponding to the largest two eigenvalues. Further, let
$\bar{W} := \mathrm{Span}\{\bar{v}_1, \bar{v}_2\}$ and $W := \mathrm{Span}\{v_1, v_2\}$.


Lemma 3.5 For any constant $c < 1$, we can choose constants $C_2$ and $C_3$ such that if
$a - b > C_2\sqrt{a + b} = C_2\sqrt{d}$ and $a > C_3$, then $\sin\angle(\bar{W}, W) \leq c < 1$ with probability $1 - o(1)$.


Proof of Lemma 3.5. Let $C_3$ be a constant such that if $a > C_3$, then Corollary 3.2 holds, giving
us $\|\Delta\| \leq 1$. From Corollary 3.4 we have that $\|E\| \leq C\sqrt{d}$. The lemma then follows from the
Davis-Kahan [Dav63, Bha97] bound for the matrices $\bar{A}_0$ and $A$, which gives
$\sin\angle(\bar{W}, W) = O\left(\frac{\|\Delta\| + \|E\|}{a - b}\right) = O\left(\frac{\sqrt{d}}{a-b}\right)$.
Therefore, the lemma follows by choosing $C_2$ big enough.
3.2 Recovery

Given a subspace $W$ satisfying $\sin\angle(\bar{W}, W) \leq c < 1/16$, we can recover a big portion of the
vertices. We prove (in appendix A.2) that

Lemma 3.6 Given a subspace $W$ satisfying $\sin\angle(\bar{W}, W) \leq c < 1/16$, we can recover an
$8c/3$-correct partition.

Once we have an approximate partition, we can use the Blue edges to boost it in the Correction
step. We prove (in appendix A.3)

Lemma 3.7 Given a 0.1-correct partition $V_1', V_2'$ as input to the Correction routine in Figure 3,
the algorithm outputs a $\gamma$-correct partition with $\gamma = 2\exp\left(-0.072\frac{(a-b)^2}{a+b}\right)$.

4 Multiple communities 
4.1 Overview 

Let us start with the algorithm, which (compared to the algorithm for the case of 2 blocks) has an 
additional step of random splitting. This additional step is needed in order to recover the partitions. 
We will start by computing an approximation of the space spanned by the first k eigenvectors of 
the hidden matrix. However, when k > 2, it is not obvious how to approximate the eigenvectors 
themselves. To handle this problem, we need a new argument that requires this extra step. 


Partition

1. Input the adjacency matrix $A_0$, $a$, $b$.

2. Randomly color the edges with Red and Blue with equal probability.

3. Randomly partition $V$ into two subsets $Y$ and $Z$. Let $B$ be the adjacency matrix of the
bipartite graph between $Y$ and $Z$ consisting only of the Red edges, with rows indexed by $Z$
and columns indexed by $Y$.

4. Run Spectral Partition (Figure 6) on the matrix $B$ and get $U_1', U_2', \ldots, U_k'$ as output. This part
uses only the Red edges that go between vertices in $Y$ and $Z$ and outputs an approximation
to the clustering $Z = U_1 \cup \ldots \cup U_k$. Here, $U_i := V_i \cap Z$.

5. Run Correction (Figure 7) on the Red graph. This procedure only uses the Red edges that are
internal to $Z$ and improves the clustering in $Z$.

6. Run Merging (Figure 8) on the Blue graph. This part uses only the Blue edges that go between
vertices in $Y$ and $Z$ and assigns the vertices in $Y$ to the appropriate cluster.


Figure 5: Partition
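Step 3 of Figure 5 can be illustrated with a short sketch. It assumes the Red adjacency matrix is already available and that each vertex is placed in $Z$ independently with probability 1/2 (the text only says the partition is random, so this choice is an assumption); the function name `bipartite_red_block` is hypothetical.

```python
import numpy as np

def bipartite_red_block(red, rng):
    """Randomly split the vertices into Y and Z and extract the Red Z-by-Y block.

    Returns (Y, Z, B) where Y and Z are index arrays and B has rows indexed
    by Z and columns indexed by Y.
    """
    n = red.shape[0]
    in_Z = rng.random(n) < 0.5            # assumption: independent fair coin per vertex
    Z = np.flatnonzero(in_Z)
    Y = np.flatnonzero(~in_Z)
    B = red[np.ix_(Z, Y)]                 # Red edges between Z (rows) and Y (columns)
    return Y, Z, B
```

The Red edges internal to $Z$ (used by Correction) would be `red[np.ix_(Z, Z)]`, which does not overlap with the entries of `B`.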

Since we use a different set of edges for each step, we have independence across the steps.

4.2 Details

Step 1 is a spectral algorithm on a portion of the adjacency matrix $A_0$ as given in Figure 6. This will
enable us to recover a large portion of the blocks $Z \cap V_1, \ldots, Z \cap V_k$. We will prove the following
statement (appendix B.1).


Theorem 4.1 There exist constants $C_1, C_2$ such that for any fixed integer $k$ the following holds.










Spectral Partition.

1. Input $B$ (a matrix of dimension $|Z| \times |Y|$), $a$, $b$ and $k$.

2. Let $Y_1$ be a random subset of $Y$ obtained by selecting each element with probability $\frac{1}{2}$ independently,
and let $A_1, A_2$ be the submatrices of $B$ formed by the columns indexed by $Y_1$ and $Y_2 := Y \setminus Y_1$,
respectively.

3. Let $d := a + (k - 1)b$. Zero out all the rows and columns of $A_1$ corresponding to vertices
whose degree is bigger than $20d$, to obtain the matrix $A$.

4. Find the space spanned by $k$ left singular vectors of $A$, say $W$.

5. Let $a_1, \ldots, a_m$ be some $m = 2\log n$ random columns of $A_2$. For each $i$, project $a_i - \bar{a}$ onto
$W$, where $\bar{a}$ is a constant vector (all of its coordinates $\bar{a}(j)$ are equal).

6. For each projected vector, identify the top (in value) $n/2k$ coordinates. Of the $2\log n$ sets so
obtained, discard half of the sets with the lowest Blue edge density in them.

7. Of the remaining subsets, identify some $k$ subsets $U_1', \ldots, U_k'$ such that $|U_i' \cap U_j'| < 0.2n/2k$
for $i \neq j$.

8. Output $U_1', \ldots, U_k'$.

Figure 6: Spectral Partition


1. a > b > C\ 

2. -— > C 2 k 2 -, and 

then we can find a $\gamma$-correct partition $U_1', \ldots, U_k'$ of $Z$ with high probability using a simple spectral
algorithm.

Step 2 (Figure 7) is a further correction that gives us the optimal (logarithmic) dependence
between $\gamma$ and $\frac{(a-b)^2}{k(a+b)}$. The idea here is to use the degree sequence to correct the mislabeled
vertices in $Z$. Consider a mislabeled vertex $u \in Z \cap V_1$. As $u \in Z \cap V_1$, we expect $u$ to have
$a/4k$ Red neighbors in $Z \cap V_1$ and $b/4k$ Red neighbors in $Z \cap V_i$ for all $i \neq 1$. Assuming that
Spectral Partition outputs $U_1', \ldots, U_k'$ where $|U_i \setminus U_i'| \leq 0.1n/2k$, we expect $u$ to have at most
$0.9b/4k + 0.1a/4k$ Red neighbors in $U_i'$ for $i \neq 1$ and at least $0.1b/4k + 0.9a/4k$ Red neighbors in $U_1'$. As

$$\frac{0.1b}{8k} + \frac{0.9a}{8k} > \frac{a+b}{16k} > \frac{0.9b}{8k} + \frac{0.1a}{8k},$$

we can correctly reclassify $u$ by thresholding. We can prove (appendix B.2)

Lemma 4.2 Given a 0.1-correct partition of $Z = (Z \cap V_1) \cup \ldots \cup (Z \cap V_k)$ and the Red
graph over $Z$, the subroutine Correction given in Figure 7 computes a $\gamma$-correct partition with
$\gamma = 2k\exp\left(-0.04\frac{(a-b)^2}{k(a+b)}\right)$.

Step 3 is to use the clustering information of vertices in $Z$ to label the vertices in $Y$, and is
similar to step 2. We prove (appendix B.3)

Correction.

1. Input: A collection of subsets $U_1', \ldots, U_k' \subset Z$ and a graph on $Z$.

2. For every $u \in Z$, if $i \in \{1, 2, \ldots, k\}$ is such that $u$ has the maximum number of neighbors in $U_i'$, then add $u$
to $U_i''$. Break ties arbitrarily.

3. Output $U_1'', \ldots, U_k''$.

Figure 7: Correction
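The $k$-way Correction step amounts to a majority vote over neighbor counts. The sketch below is illustrative only: the function name `correction_k` is hypothetical, cluster labels are assumed to be integers in $\{0, \ldots, k-1\}$, and ties are broken in favor of the smallest label (the figure allows arbitrary tie-breaking).

```python
import numpy as np

def correction_k(adj, labels, k):
    """Reassign every vertex to the cluster containing most of its neighbours.

    adj:    symmetric 0/1 adjacency matrix of the graph on Z
    labels: integer array, labels[u] in {0, ..., k-1} is the current cluster of u
    """
    onehot = np.eye(k, dtype=int)[labels]     # |Z| x k membership matrix
    counts = adj @ onehot                     # counts[u, i] = neighbours of u in cluster i
    return counts.argmax(axis=1)              # ties go to the smallest cluster index
```

On three disjoint triangles with one vertex of the first triangle mislabeled, a single pass restores the correct labels.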


Lemma 4.3 Given a 0.1-correct partition of $Z = (Z \cap V_1) \cup \ldots \cup (Z \cap V_k)$ and the Blue
graph over $Y \cup Z$, the subroutine Merging given in Figure 8 computes a $\gamma$-correct partition with
$\gamma = 2k\exp\left(-0.0324\frac{(a-b)^2}{k(a+b)}\right)$.


Merging.

1. Input: A partition $U_1', \ldots, U_k'$ of $(Z \cap V_1) \cup (Z \cap V_2) \cup \ldots \cup (Z \cap V_k)$ and a graph between
the vertices of $Y$ and $Z$.

2. For all $u \in Y$, label $u$ with '$i$' if the number of neighbors of $u$ in $U_i'$ is at least $\frac{a+b}{8k}$. Label the
conflicts arbitrarily.

3. Output the label classes as the clusters $V_1', \ldots, V_k'$.



Figure 8: Merging

Combining Lemmas 4.2 and 4.3, we get the stated result.





Figure 9: On the left is the density plot of the input (unclustered) matrix with parameters $n = 3000$,
$a = 22$, $b = 2$ and on the right is the density plot of the permuted matrix after running the
algorithm described above. This took less than 1 second in Matlab running on a 2009 MacPro.


5 Censor Block Model 

We first introduce some notation so as to write this problem in a way similar to the other problems
in this paper. To simplify the analysis, we make the following assumptions. We assume that there
are $|V| = 2n$ vertices, with exactly $n$ of them labeled 1, and the rest labeled 0. As in [ABBS14],
we assume that $G \in G_{2n,p}$ is a graph generated from the Erdős-Rényi model with edge probability
$p$. Since any edge $(i, j)$ appears with probability $p$, and $z_e \sim$ Bernoulli($\epsilon$), we have

$$y_{i,j} = \begin{cases} x_i \oplus x_j & \text{w.p. } p(1 - \epsilon) \\ x_i \oplus x_j \oplus 1 & \text{w.p. } p\epsilon \\ 0 & \text{w.p. } 1 - p. \end{cases}$$

For any $i, j \in V$, let us write $w_{i,j} := x_i \oplus x_j$, and $W := (w_{i,j})_{i,j}$ for the associated $2n \times 2n$ matrix.


Spectral Partition II.

1. Input the adjacency matrix $Y$, $p$.

2. Zero out all the rows and columns of $Y$ corresponding to vertices whose degree is bigger than
$20pn$, to obtain the matrix $Y_0$.

3. Find the eigenspace $U$ corresponding to the top two eigenvalues of $Y_0$.

4. Compute $v_1$, the projection of the all-ones vector onto $U$.

5. Let $v_2$ be the unit vector in $U$ perpendicular to $v_1$.

6. Sort the vertices according to their values in $v_2$, and let $V_1' \subset V$ be the top $n$ vertices, and
$V_2' \subset V$ be the remaining $n$ vertices.

7. Output $(V_1', V_2')$.

Figure 10: Algorithm 3


We note that $\bar{y}_{i,j} := \mathbb{E}(y_{i,j}) = p\epsilon + p(1 - 2\epsilon)w_{i,j}$. Therefore, we can write $y_{i,j} = \bar{y}_{i,j} + \zeta_{i,j}$,
where the $\zeta_{i,j}$ are mean zero random variables satisfying $\mathrm{Var}(\zeta_{i,j}) \leq p$. First we note that we can
recover the two communities from the eigenvectors of the $2n \times 2n$ matrix $\bar{Y} := (\bar{y}_{i,j}) = p\epsilon J + p(1 - 2\epsilon)W$,
where $J$ is the all-ones matrix. $\bar{Y}$ is a rank 2 matrix with nonzero eigenvalues of magnitude $pn$ and $p(1 - 2\epsilon)n$, with the corresponding
eigenvectors $v_1 = (1, \ldots, 1)$ and $v_2 = (1, \ldots, 1, -1, \ldots, -1)$. If we can find $v_2$, we can identify
the two blocks. Let $Y = (y_{i,j})$ and $E = (\zeta_{i,j})$ be $2n \times 2n$ matrices. Algorithm 3 in Figure 10 (which
is essentially the same as Algorithm 2), which takes as input the adjacency matrix $Y$ and the edge
probability $p$, achieves this when $np > \frac{C_1}{(1-2\epsilon)^2}$. More detail appears in appendix C.
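The structure of the expected observation matrix can be checked numerically. The sketch below assumes NumPy, takes the labels $x$ to be 1 on the first $n$ vertices and 0 on the rest, and uses the all-ones matrix for the constant term; the parameter values are illustrative. Eigenvalues are ordered by magnitude, since with this 0/1 convention for $W$ the eigenvalue attached to the block indicator comes out negative in this sketch.

```python
import numpy as np

n, p, eps = 30, 0.2, 0.1
x = np.concatenate([np.ones(n, dtype=int), np.zeros(n, dtype=int)])   # block labels
W = x[:, None] ^ x[None, :]                                           # w_ij = x_i XOR x_j
Y_bar = p * eps * np.ones((2 * n, 2 * n)) + p * (1 - 2 * eps) * W     # expected observations

eigvals, eigvecs = np.linalg.eigh(Y_bar)
order = np.argsort(-np.abs(eigvals))        # sort by magnitude, largest first
lead = eigvals[order[:2]]                   # the two nonzero eigenvalues
v2 = eigvecs[:, order[1]]                   # eigenvector separating the blocks
```

The magnitudes of `lead` should match $pn$ and $p(1-2\epsilon)n$, and the sign pattern of `v2` should be constant on each block and opposite across blocks.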
A Two communities
A.1 Bounding $\|\Delta\|$ and $\|E\|$

Proof of Lemma 3.1. One can prove Lemma 3.1 using a standard argument from random graph
theory. Consider a set of vertices $X \subset V$ of size $|X| = cn$, where $c < 1$ is a constant. We first
bound the probability that all the vertices in this set have degree greater than $20d$.

Let us denote the set of edges on $X$ by $E(X)$ and the set of edges with exactly one end point in
$X$ by $E(X, X^c)$. If each degree in $X$ is at least $20d$, then a quick consideration reveals that either
$|E(X)| \geq 2cnd$ or $|E(X, X^c)| \geq 8cnd$. The expected number of edges $\mu_{E(X)} := \mathbb{E}(|E(X)|)$
satisfies

0.25(cn) 2 — < fi E(x) < 0.5(cn) 2 —. 
n v ’ n 

Let := 2 < , then Chernoff bound (see [AS04 1 for example) gives 


P(|L;(X)| > end ) < 


p(ii-i)V m 

sf J 


< exp ( (-1-log ( - ) ) 0.25(cn) 2 — 


n 


1 


< exp — log - 0.25(cn)"— (for small enough c 


n 

= exp ^—0.25log aen'j . 

Similarly, the expected number of edges p e(x,x c ) i n E ( X , X c ) satisfies 

, 2 ® ^ ^ 


Let ^2 := 4 < 


8 end 
VE{X,X C ) 


c(l - c)n 2 - < p E { x,x°) < c ( 2 ~ c)n 2 — . 

n K ’ n 

, then by Chernoff bound 


P(|^(X,X C )| > 8 end) < 
< 


/'exp(fe-l)Y*‘ W 

l 4* J 

exp (—c (2 — c)an ). 
Now, if we substitute $c = a^{-3}$ in the above bounds, we get

$$\mathbb{P}(|E(X)| \geq 2cnd) \leq \exp\left(-0.75\log(a)a^{-2}n\right)$$
$$\mathbb{P}(|E(X, X^c)| \geq 8cnd) \leq \exp\left(-a^{-2}n\right).$$

There are at most $\binom{2n}{cn} \leq \left(\frac{2e}{c}\right)^{cn}$
subsets $X$ of size $|X| = cn$. Substituting $c = a^{-3}$ again, we get

$$\binom{2n}{cn} \leq \exp\left(4a^{-3}\log(a)n\right).$$

The claim follows from the union bound.

Proof of Lemma 3.3. We start by proving a simpler result.


Lemma A.1 Let $M$ be a random symmetric matrix of size $n$ with zero diagonal whose entries above
the diagonal are independent with the following distribution

$$M_{ij} = \begin{cases} 1 - p_{ij} & \text{w.p. } p_{ij} \\ -p_{ij} & \text{w.p. } 1 - p_{ij}. \end{cases}$$

Let $\sigma^2 \geq C_1\frac{\log n}{n}$ be a quantity such that $p_{ij} \leq \sigma^2$ for all $i, j$, where $C_1$ is a constant. Then
with probability $1 - o(1)$, $\|M\| \leq C_2\sigma\sqrt{n}$ for some constant $C_2 > 0$.


Let us address Lemma A.1. A weaker bound $C\sigma\sqrt{n\log n}$ follows easily from Ahlswede-Winter
type matrix concentration results (see [Tro12]). To prove the claimed bound, we need to
be more careful and follow the $\epsilon$-net approach by Kahn and Szemerédi for random regular graphs
in [FKS89] (see also [AK94, FO05]).


Consider a $\frac{1}{2}$-net $\mathcal{N}$ of the unit sphere $S^n$. We can assume $|\mathcal{N}| \leq 5^n$. It suffices to prove
that there exists a constant $C_2'$ such that with probability $1 - o(1)$, $|x^T M y| \leq C_2'\sigma\sqrt{n}$ for all
$x, y \in \mathcal{N}$.

For two vectors $x, y \in \mathcal{N}$, we follow an argument of Kahn and Szemerédi [FKS89] and call
all pairs $(i, j)$ such that $|x_i y_j| \leq \frac{\sigma}{\sqrt{n}}$ light and all remaining pairs heavy, and denote these two
classes by $L$ and $H$ respectively. We have

$$x^T M y = \sum_{i,j} x_i M_{i,j} y_j = \sum_L x_i M_{i,j} y_j + \sum_H x_i M_{i,j} y_j.$$

We now show that with probability $1 - o(1)$, the last two summands are small in absolute value.
First, let us consider the contribution of light couples. We rewrite $X := \sum_L x_i M_{i,j} y_j$ as
$\sum_{i > j} a_{i,j}M_{i,j}$, where

$$a_{i,j} = \begin{cases} x_i y_j + x_j y_i & \text{if } (i,j), (j,i) \in L \\ x_i y_j & \text{if only } (i,j) \in L \\ x_j y_i & \text{if only } (j,i) \in L. \end{cases}$$

By the definition of light pairs, $|a_{i,j}| \leq \frac{2\sigma}{\sqrt{n}}$. Also, since $x$ and $y$ are unit vectors, $\sum_{i,j} a_{i,j}^2 \leq 4$.
Therefore, by Bernstein's bound (see, e.g., page 36 in [BLM13]),

$$\mathbb{P}(X > t) \leq \exp\left(-\frac{t^2}{2\left(\sigma^2\sum_{i,j} a_{i,j}^2 + \frac{2\sigma t}{3\sqrt{n}}\right)}\right).$$

Setting $t = 10\sigma\sqrt{n}$ and using the union bound (combined with the fact that the net has at most $5^n$
vectors), we can conclude that with probability at least $1 - \exp(-3n)$, $|\sum_L x_i M_{i,j} y_j| \leq 10\sigma\sqrt{n}$ for all $x, y \in \mathcal{N}$.
Next we handle the heavy pairs in $H$. Since $1 = \sum_{i,j} x_i^2 y_j^2 \geq \sum_H x_i^2 y_j^2$, the definition of heavy implies
that $\sum_H |x_i y_j| \leq \frac{\sqrt{n}}{\sigma}$.

Let $A_{i,j} := M_{i,j} + p_{i,j}$; then

$$\sum_H x_i M_{i,j} y_j = \sum_H x_i A_{i,j} y_j - \sum_H p_{i,j} x_i y_j.$$

Note that $A$ defines a graph, say $G_A$, such that $A$ is its adjacency matrix. As $p_{i,j} \leq \sigma^2$, we have
$\sum_H p_{i,j}|x_i y_j| \leq \sigma^2\frac{\sqrt{n}}{\sigma} = \sigma\sqrt{n}$. We use the following lemma to bound the first term.
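The bound on the total mass of the heavy pairs holds deterministically for any pair of unit vectors, which makes it easy to sanity-check. The sketch below is illustrative: the function name `heavy_mass` is hypothetical, and a pair is treated as heavy when $|x_i y_j|$ exceeds $\sigma/\sqrt{n}$.

```python
import numpy as np

def heavy_mass(x, y, sigma):
    """Return (sum of |x_i y_j| over heavy pairs, the bound sqrt(n)/sigma)."""
    n = len(x)
    prod = np.abs(np.outer(x, y))             # |x_i y_j| for all pairs (i, j)
    heavy = prod > sigma / np.sqrt(n)         # heavy pairs: product above the threshold
    return prod[heavy].sum(), np.sqrt(n) / sigma
```

Because every heavy term satisfies $|x_i y_j| \leq x_i^2 y_j^2 \cdot \frac{\sqrt{n}}{\sigma}$ and the squares sum to at most 1, the first returned value never exceeds the second.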

Lemma A.2 Let $G = (V, E)$ be any graph whose adjacency matrix is denoted by $A$, and $x, y$
be any two unit vectors. Let $d$ be such that the maximum degree is at most $c_1 d$. Further, let $d$ satisfy
the property that for any two subsets of vertices $S, T \subset V$ one of the following holds for some
constants $c_2$ and $c_3$:

$$\frac{e(S,T)}{|S||T|\frac{d}{n}} \leq c_2 \qquad (1.1)$$

$$e(S,T)\log\left(\frac{e(S,T)}{|S||T|\frac{d}{n}}\right) \leq c_3|T|\log\left(\frac{n}{|T|}\right) \qquad (1.2)$$

Then $\sum_H x_i A_{i,j} y_j \leq \max(16, 8c_1, 32c_2, 32c_3)\sqrt{d}$. Here $H := \{(i, j) : |x_i y_j| \geq \sqrt{d}/n\}$.

The proof appears in appendix [P] 

Lemma A.3 Let d := a 2 n. Then with probability 1 — o(l), the maximum degree in the graph Ga 
is < 20d and for any S,T C V one of the conditions (??) or (?? ) holds. 


The two lemmas above guarantee that with probability 1 — o(l), | x l A 1 ^yf < C' a yjn 
for some constant C'. 

Proof The bound on the maximum degree follows from the Chernoff bound. We have that 


A.. -I 1 W -P' Pv 

lj \ 0 w.p. 1 - Pij 

Consider a particular vertex k and let X = A r y. be the random variable denoting the number 
of edges incident on it. We have that 

H = EX = Y2Pik < 

i 

For any l > 4, Chernoff bound (see BAS041 ) implies that 
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P(X > la 2 n) < exp ( — 


< exp — 


a 2 nl In l 
3 

l logn 


Applying this with $l = 20$, and taking a union bound over all the vertices, we can bound the
maximum degree by $20\sigma^2 n$. Now let $S, T \subset V$ be any two subsets. Let $X := e(S,T)$ be the
number of edges going between $S$ and $T$. We have $\mathbb{E}X \le \sigma^2|S||T|$. If $|T| \ge n/e$, then since the
maximum degree is at most $20\sigma^2 n$, we have $e(S,T) \le |S| \cdot 20\sigma^2 n \le 20e\,\sigma^2|S||T|$, giving us (1.1) in this
case. Therefore, we can assume $|T| < n/e$. By the Chernoff bound, it follows that for any $l \ge 4$,


\[ \mathbb{P}\left(e(S,T) \ge l\sigma^2|S||T|\right) \le \exp\left(-\frac{l\ln(l)\,\sigma^2|S||T|}{3}\right). \]


Let l' be the smallest number such that l' ln(7') > log () • As in I1FQ051 . if we choose 


m 


l = max(/',4), we can bound the above probability by exp (|g|) (|J|) < 

Therefore, by the union bound we get that with probability $1 - o(1)$, for all subsets $S, T$,
\[ e(S,T) \le \max(l', 4)\,\sigma^2|S||T|. \]
This implies that one of the conditions (1.1) or (1.2) holds with probability $1 - o(1)$.


Proof of Lemma |!P} Now we are ready to prove Lemma [373] by modifying the previous proof. 
We again handle the light couples and the heavy couples separately, but need to make a modifica¬ 
tion to the argument for the light couples. 

Since we zero out some rows and columns of $M$ to obtain $M_1$, we first bound the norm of the
matrix $M_0$, obtained from $M$ by zeroing out a set $S$ of rows and the corresponding columns. Next,
we take a union bound over all choices of $S$. For a fixed $S$, Lemma A.1 implies that with probability
at least $1 - \exp(-3n)$, for all $x, y$ in the net, $\left|\sum_L x_i (M_0)_{ij} y_j\right| \le 10\sigma\sqrt{n}$. Since there are at most
$2^n = \exp(n \ln 2)$ choices for $S$, we can apply a union bound to show that with probability at least
$1 - \exp(-(3 - \ln 2)n)$, $\left|\sum_L x_i (M_1)_{ij} y_j\right| \le 10\sigma\sqrt{n}$.

The proof for the heavy couples goes through without any modifications. We just have to
verify that the conditions of Lemma A.2 are met. Firstly, the adjacency matrix $A_1$ obtained from
$M_1$ has the bounded degree property by the definition of $M_1$. Now we note that only in the case
$|T| \ge n/e$ did we need that the maximum degree was bounded. So for any $|S| \le |T| < n/e$,
one of the discrepancy properties (1.1) or (1.2) holds for $A_1$, since zeroing out rows and columns can only
decrease the edge count across sets of vertices. In the case $|T| \ge n/e$, as before we can show that
(1.1) holds for $A_1$ since the degrees are bounded. ■


Now, to bound the norm of matrix $E$, we just appeal to 3.3. Suppose $a > b > C_0$, for a large
enough constant $C_0$ to be determined later. Since $A = \bar A + \Delta + E$ and we have bounded $\Delta$, it
remains to bound $\|E\|$. Note that
\[ (E_0)_{ij} = \begin{cases} 1 - \frac{a}{n} & \text{w.p. } \frac{a}{n} \\ -\frac{a}{n} & \text{w.p. } 1 - \frac{a}{n} \end{cases} \]
if $i, j$ belong to the same community and
\[ (E_0)_{ij} = \begin{cases} 1 - \frac{b}{n} & \text{w.p. } \frac{b}{n} \\ -\frac{b}{n} & \text{w.p. } 1 - \frac{b}{n} \end{cases} \]
if $i, j$ belong to different communities. Since $a > b$, for all $i, j$ we have that
\[ \mathrm{Var}((E_0)_{ij}) \le \frac{a}{n}\left(1 - \frac{a}{n}\right) \le \frac{a}{n}. \]


A.2 Recovery 

Now we focus on the second step in the proof, namely the recovery of the blocks once the angle 
condition is satisfied. 

Lemma A.4 If $\sin\angle(\bar W, W) \le c \le \frac{1}{4}$, then we can find a vector $v \in W$ such that $\sin\angle(v, \bar v_2) \le 2\sqrt{c}$.

Proof Let $P_{\bar W}, P_W$ be the orthogonal projection operators onto the subspaces $\bar W, W$ respectively.
From the angle bound for the subspaces, we have that
\[ \|P_{\bar W} - P_W\|_2 \le c. \]
The vector we want is obtained as follows. We first project $\bar v_1$ onto $W$, and then find the unit
vector orthogonal to the projection in $W$. We will now prove that the vector so obtained satisfies
the bound stated in the lemma. Since $\bar v_1, \bar v_2 \in \bar W$, we have that $\|P_W \bar v_i - \bar v_i\|_2 \le c$ for $i = 1, 2$.
Let us define $u_i := P_W \bar v_i$ and $x_i := u_i - \bar v_i$ (note that $\|x_i\| \le c$) for $i = 1, 2$. We will now show
that the vector $v \in W$ perpendicular to $u_1$ is close to $\bar v_2$. Let $u_\perp = u_2 - \frac{u_1^T u_2}{\|u_1\|^2} u_1$; it is then clear
that $\|u_\perp\| \le 1$. Note that $|u_1^T u_2| = |\bar v_1^T x_2 + \bar v_2^T x_1 + x_1^T x_2| \le 2c + c^2$. We have
\[ u_\perp^T \bar v_2 = u_2^T \bar v_2 - \frac{(u_1^T u_2)(\bar v_2^T u_1)}{\|u_1\|^2}, \]
\[ |u_\perp^T \bar v_2| \ge 1 - c - \frac{(2c + c^2)c}{(1 - c)^2} \ge 1 - 2c. \]
The last inequality holds when $c \le \frac{1}{4}$. Therefore, it holds that for a unit vector $v \perp u_1$,
\[ |v^T \bar v_2| \ge |u_\perp^T \bar v_2| \ge 1 - 2c. \]
This gives $\sin\angle(v, \bar v_2) \le \sqrt{1 - (1 - 2c)^2} \le 2\sqrt{c}$. ■
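The construction in the proof of Lemma A.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `dot` and `recover_second_vector` are hypothetical, $W$ is assumed to be given by an orthonormal basis `w1, w2`, and the example planes in $\mathbb{R}^3$ are chosen so that the subspace angle is exactly `theta`.

```python
import math

def dot(x, y):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))

def recover_second_vector(w1, w2, v1_bar):
    """Unit vector in span(w1, w2) orthogonal to the projection of v1_bar.

    Assumes (w1, w2) is an orthonormal basis of the two-dimensional space W.
    """
    # Coordinates of u_1 = P_W v1_bar in the basis (w1, w2).
    alpha, beta = dot(w1, v1_bar), dot(w2, v1_bar)
    norm = math.hypot(alpha, beta)
    if norm == 0.0:
        raise ValueError("v1_bar is orthogonal to W")
    # Rotating (alpha, beta) by 90 degrees inside W gives the orthogonal direction.
    return [(-beta * a + alpha * b) / norm for a, b in zip(w1, w2)]

# Example: W_bar = span(e1, e2), and W is W_bar tilted by the angle theta.
theta, phi = 0.1, 0.7
c = math.sin(theta)                      # sine of the angle between the planes
w1 = [1.0, 0.0, 0.0]
w2 = [0.0, math.cos(theta), math.sin(theta)]
v1_bar = [math.cos(phi), math.sin(phi), 0.0]
v2_bar = [-math.sin(phi), math.cos(phi), 0.0]

v = recover_second_vector(w1, w2, v1_bar)
cos_angle = abs(dot(v, v2_bar))
sin_angle = math.sqrt(max(0.0, 1.0 - cos_angle ** 2))
```

In this example `cos_angle` stays above $1 - 2c$ and `sin_angle` stays below $2\sqrt{c}$, as the lemma predicts for $c \le 1/4$.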

Lemmas 3.5 and A.4 together give

Corollary A.5 For any constant $c < 1$, we can choose constants $C_2$ and $C_3$ in Lemma 3.5 and find
a vector $v$ such that $\sin\angle(\bar v_2, v) < c < 1$ with probability $1 - o(1)$.


We now can conclude the proof of our theorem using the following deterministic fact. 

Lemma A.6 If $\sin\angle(\bar v_2, v) < c \le 0.5$, then we can identify at least a $(1 - \frac{8}{3}c^2)$ fraction of
vertices from each block correctly.

Proof of Lemma A.6 Let us define two sets of vertices, $V_1' = \{i : v(i) > 0\}$ and $V_2' = \{i : v(i) \le 0\}$.
One of the sets will have at most $n$ vertices; let us assume without loss of
generality that $|V_1'| \le n$. Write $v = c_1 \bar v_2 + \mathrm{err}$, for a vector $\mathrm{err}$ perpendicular to $\bar v_2$ with
$\|\mathrm{err}\| < c$. We also have $c_1 > \sqrt{1 - c^2}$. Since $\|\mathrm{err}\| < c$, not more than $\frac{2c^2 n}{c_1^2}$ coordinates of
$\mathrm{err}$ can be bigger than $\frac{c_1}{\sqrt{2n}}$ in absolute value. Since $v = c_1 \bar v_2 + \mathrm{err}$, at least a $1 - \frac{2c^2}{1 - c^2} \ge 1 - \frac{8}{3}c^2$ (since
$c \le 0.5$) fraction of vertices with $\bar v_2(i) = \frac{1}{\sqrt{2n}}$ will have $v(i) > 0$. Therefore, we get that there are
at least $(1 - \frac{8}{3}c^2)n$ vertices belonging to the first block. ■
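The sign-based split used in this proof is easy to state in code. The sketch below is illustrative only: the function names and the toy vector are assumptions, and the error count is taken up to swapping the two block labels, since the sign of an eigenvector is arbitrary.

```python
def sign_partition(v):
    """Split indices by the sign of the corresponding coordinate of v."""
    first = [i for i, x in enumerate(v) if x > 0]
    second = [i for i, x in enumerate(v) if x <= 0]
    return first, second

def misclassified(v, truth):
    """Number of vertices placed in the wrong block, up to swapping labels.

    truth[i] is 1 or 2, the true block of vertex i.
    """
    first, second = sign_partition(v)
    errors = sum(1 for i in first if truth[i] != 1)
    errors += sum(1 for i in second if truth[i] != 2)
    return min(errors, len(v) - errors)

# Toy example: a noisy version of the vector (+,+,+,-,-,-).
v = [0.5, 0.4, -0.1, -0.5, -0.6, 0.2]
truth = [1, 1, 1, 2, 2, 2]
errors = misclassified(v, truth)
```

Here two coordinates have the wrong sign, so `errors` counts two misclassified vertices.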


A.3 Proof of Lemma 2.3


We will use the following large deviation result (see, e.g., page 36 in [BLM13]) repeatedly.

Lemma A.7 (Chernoff) If $X$ is a sum of $n$ iid indicator random variables with mean at most
$\rho \le 1/2$, then for any $t > 0$
\[ \max\{\mathbb{P}(X \ge \mathbb{E}X + t),\ \mathbb{P}(X \le \mathbb{E}X - t)\} \le \exp\left(-\frac{t^2}{2\,\mathrm{Var}X + t}\right) \le \exp\left(-\frac{t^2}{2n\rho + t}\right). \]
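As a sanity check on the last expression in Lemma A.7, one can compare it with an exact binomial tail. The sketch below is an illustration, not part of the proof; the function names and the parameter choice $n = 200$, $\rho = 0.1$, $t = 10$ are assumptions made for the example.

```python
import math

def chernoff_bound(n, rho, t):
    """Right-hand side exp(-t^2 / (2 n rho + t)) of the tail bound."""
    return math.exp(-t * t / (2.0 * n * rho + t))

def binomial_upper_tail(n, p, threshold):
    """Exact P(X >= threshold) for X ~ Binomial(n, p)."""
    start = max(0, math.ceil(threshold))
    return sum(
        math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)
        for j in range(start, n + 1)
    )

n, p, t = 200, 0.1, 10.0
exact = binomial_upper_tail(n, p, n * p + t)   # P(X >= EX + t)
bound = chernoff_bound(n, p, t)                # exp(-100 / 50) = exp(-2)
```

For these parameters the exact tail is well below the bound, which is what one expects from a Bernstein-type inequality.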


In the Red graph, the edge densities are $a/2n$ and $b/2n$, respectively. By Theorem 2.1 there
is a constant $C$ such that if $\frac{(a-b)^2}{a+b} \ge C$ then by running Spectral Partition on the Red graph, we
obtain, with probability $1 - o(1)$, two sets $V_1'$ and $V_2'$ where
\[ |V_i \setminus V_i'| \le 0.1n. \]
In the rest, we condition on this event, and the event that the maximum Red degree of a vertex
is at most $\log^2 n$, which occurs with probability $1 - o(1)$.

Now we use the Blue edges. Consider $e = (u, v)$. If $e$ is not a Red edge, and $u \in V_i$, $v \in V_{3-i}$,
then $e$ is a Blue edge with probability
\[ \mu := \frac{b/2n}{1 - \frac{b}{2n}}. \]
Similarly, if $e$ is not a Red edge, and $u, v \in V_i$, then $e$ is a Blue edge with probability
\[ \tau := \frac{a/2n}{1 - \frac{a}{2n}}. \]
Thus, for any $u \in V_1' \cap V_1$, the number of its Blue neighbors in $V_2'$ is at most
\[ S(u) := \sum_{i=1}^{0.9n} \xi_i^u + \sum_{j=1}^{0.1n} \zeta_j^u, \]
where $\xi_i^u$ are iid indicator variables with mean $\mu$ and $\zeta_j^u$ are iid indicator variables with mean $\tau$.
Similarly, for any $u \in V_1' \cap V_2$, the number of its Blue neighbors in $V_2'$ is at least
\[ S'(u) := \sum_{i=1}^{0.9n - d(u)} \zeta_i^u + \sum_{j=1}^{0.1n} \xi_j^u, \]
where $d(u) \le \log^2 n$ is the Red degree of $u$.

After the correction sub-routine, a vertex $u$ in the (corrected) set $V_1'$ is misclassified if

• $u \in V_1' \cap V_1$ and $S(u) \ge \frac{a+b}{4}$, or
• $u \in V_1' \cap V_2$ and $S'(u) \le \frac{a+b}{4}$.

Let $\rho_1, \rho_2$ be the probabilities of the above events. Then the number of misclassified vertices in
the (corrected) set $V_1'$ is at most
\[ M := \sum_{k=1}^{n} \Gamma_k + \sum_{l=1}^{0.1n} \Lambda_l, \]
where $\Gamma_k$ are iid indicator random variables with mean $\rho_1$ and $\Lambda_l$ are iid indicator random variables
with mean $\rho_2$.

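The rest is a simple computation. First we use the Chernoff bound to estimate $\rho_1, \rho_2$. Consider
$\rho_1 = \mathbb{P}\left(S(u) \ge \frac{a+b}{4}\right)$. By definition, we have
\begin{align*}
\mathbb{E}S(u) &= 0.9n\mu + 0.1n\tau \\
&= 0.9n\left(\frac{b/2n}{1 - \frac{b}{2n}}\right) + 0.1n\left(\frac{a/2n}{1 - \frac{a}{2n}}\right) \tag{1.3} \\
&= 0.9\frac{b}{2} + 0.1\frac{a}{2} + 0.9\frac{b}{2}\left(\frac{1}{1 - b/2n} - 1\right) + 0.1\frac{a}{2}\left(\frac{1}{1 - a/2n} - 1\right).
\end{align*}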

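Set
\[ t := \frac{a+b}{4} - \mathbb{E}S(u), \]
we have
\[ t = 0.2(a-b) - 0.9\frac{b}{2}\left(\frac{1}{1 - b/2n} - 1\right) - 0.1\frac{a}{2}\left(\frac{1}{1 - a/2n} - 1\right) \ge 0.2(a-b) - 0.9\frac{b^2}{2n} - 0.1\frac{a^2}{2n} \ge 0.19(a-b), \]
for any sufficiently large $n$.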

Applying Chernoff's bound, we obtain
\[ \rho_1 \le \exp\left(-\frac{(0.19(a-b))^2}{2(0.9n\mu + 0.1n\tau) + 0.19(a-b)}\right). \]
By (1.3), one can show that $2(0.9n\mu + 0.1n\tau) + 0.19(a-b) = 0.71b + 0.29a + o(1) \le \frac{a+b}{2}$.
It follows that
\[ \rho_1 \le \exp\left(-0.072\frac{(a-b)^2}{a+b}\right). \]
By a similar argument, we obtain the same estimate for $\rho_2$ (the contribution of the term $d(u) \le \log^2 n$ is negligible). Thus, we can conclude that
\[ \mathbb{E}M \le 1.1n\exp\left(-0.072\frac{(a-b)^2}{a+b}\right). \]
Applying Chernoff's bound with $t := 0.9n\exp\left(-0.072\frac{(a-b)^2}{a+b}\right)$, we conclude that with probability
$1 - o(1)$
\[ M \le \mathbb{E}M + t \le 2n\exp\left(-0.072\frac{(a-b)^2}{a+b}\right). \]

This implies that with probability $1 - o(1)$,
\[ |V_1' \setminus V_1| \le 2n\exp\left(-0.072\frac{(a-b)^2}{a+b}\right). \]
By symmetry, the same conclusion holds for $|V_2' \setminus V_2|$.
Set
\[ \gamma := 2\exp\left(-0.072\frac{(a-b)^2}{a+b}\right), \]
we have, for $i = 1, 2$,
\[ |V_i \cap V_i'| = n - |V_i \cap V_{3-i}'| = n - |V_{3-i}' \setminus V_{3-i}| \ge n(1 - \gamma). \]
This shows that the output $V_1', V_2'$ form a $\gamma$-correct partition, with $\gamma$ satisfying
\[ \frac{(a-b)^2}{a+b} = \frac{1}{0.072}\log\frac{2}{\gamma} = 13.89\log\frac{2}{\gamma}, \]
proving our claim.
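The constants in this computation can be checked numerically. The following sketch is only an arithmetic illustration: the function names are hypothetical, and the factor $0.5$ reflects the bound $0.71b + 0.29a \le (a+b)/2$, which holds because $a > b$.

```python
import math

def correction_gamma(a, b, rate=0.072):
    """gamma = 2 exp(-rate (a-b)^2 / (a+b)), the error fraction after Correction."""
    return 2.0 * math.exp(-rate * (a - b) ** 2 / (a + b))

def required_gap(gamma, rate=0.072):
    """Value of (a-b)^2/(a+b) that yields a given gamma: log(2/gamma) / rate."""
    return math.log(2.0 / gamma) / rate

# Exponent rate obtained from t >= 0.19 (a-b) and denominator <= (a+b)/2.
rate_from_chernoff = 0.19 ** 2 / 0.5

# Round trip for a = 30, b = 5, where (a-b)^2/(a+b) = 625/35.
gap = required_gap(correction_gamma(30.0, 5.0))
```

The rate $0.19^2/0.5 = 0.0722$ is slightly larger than $0.072$, and $1/0.072 \approx 13.89$ is the constant in the final display.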
Proof of Corollary Notice that in the analysis of Spectral Partition, we only require $\frac{(a-b)^2}{a+b} \ge C$
for a sufficiently large constant $C$ (so $\gamma$ does not appear in the bound). In the analysis
of Correction, we require $\frac{(a-b)^2}{a+b} \ge 13.89\log\frac{2}{\gamma}$, as shown above. If $\gamma \le \epsilon$ for a sufficiently small
$\epsilon$, this assumption implies the first. Thus, the corollary holds with the assumption $\frac{(a-b)^2}{a+b} \ge 13.89\log\frac{2}{\gamma}$.

The constant 13.89 comes from the fact that the partition obtained from Spectral Partition is
0.1-correct. If one improves upon 0.1, one improves 13.89. In particular, there is a constant $\delta$ such
that if the first partition is $\delta$-correct, then one can improve 13.89 to 8.1 (or any constant larger than
8, which is the limit of the method, for that matter).

B Multiple communities 

We say the splitting is ‘perfect’ if we have |Yj n V)| = ^ = |^2 H Vi\ for % = 1 We 
will assume the splittings are perfect in the proofs for a simpler exposition. The splitting
will almost never be exactly perfect, so to be precise one has to carry an $o(1)$ error term
throughout. The bounds we give all remain essentially the same.

B.1 Proof of Theorem 4.1

To analyze this algorithm, we use the machinery developed so far combined with some ideas from
[Vu14]. We consider the stochastic block model with $k$ blocks of size $n/k$, where $k$ is a fixed constant
as $n$ grows. This is a graph on $V = V_1 \cup V_2 \cup \dots \cup V_k$ where each $|V_i| = n/k$ and for $u \in V_i$, $v \in V_j$:
\[ \mathbb{P}[(u,v) \in E] = \begin{cases} a/n & \text{if } i = j \\ b/n & \text{if } i \ne j. \end{cases} \]

We can write, as before,
\[ A = \bar A + E = \bar A_1 + \Delta + E, \]
where $\bar A, \bar A_1$ are the expected matrices, and $\Delta$ is the matrix containing the deleted rows and
columns. Let $\bar W$ be the span of the $k$ left singular vectors of $\bar A_1$. We can bound $\|\Delta\| \le 1$ by
bounding the number of high degree vertices as we did before. $E$ is given by


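\[ E_{u,v} = \begin{cases} 1 - \frac{a}{n} & \text{w.p. } \frac{a}{n} \\ -\frac{a}{n} & \text{w.p. } 1 - \frac{a}{n} \end{cases} \]
if $u, v \in V_i \cap Y_1$ for some $i \in \{1, \dots, k\}$ and
\[ E_{u,v} = \begin{cases} 1 - \frac{b}{n} & \text{w.p. } \frac{b}{n} \\ -\frac{b}{n} & \text{w.p. } 1 - \frac{b}{n} \end{cases} \]
if $u \in V_i \cap Y_1$ and $v \in V_j \cap Z$ for $i \ne j$. Since $\sigma^2 := \frac{a}{n} \ge \mathrm{Var}(E_{u,v})$, Corollary 3.3
applied to $A_1 - \bar A_1$ gives the following result.

Lemma B.1 There exists a constant $C$ such that $\|E\| \le C\sqrt{a+b}$ with probability $1 - o(1)$.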

It is not hard to show that the rank of the matrix A\ i s k, and its least non-trivial singular value 
is 07 ,,(.4 1 ) = This fact, combined with lemma B.l and an application of Davis-Kahan bound 

gives 


Lemma B.2 For any c > 0, there exists constants C\ , C-i such that if (a — b) > C\ if a and 
a > b> C 2 , then sinZ(l4 / , W) < c with probability 1 — o(l). 


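We pick $m = 2\log n$ indices uniformly randomly from $Y_2$ and project the corresponding
columns from the matrix $B$. Let $a_{i_1}, \dots, a_{i_m}$ and $e_{i_1}, \dots, e_{i_m}$ be the corresponding columns of $A_1$
and $E$, respectively. For a subspace $W_0$, let $P_{W_0}$ be the projection onto the space $W_0$. Note that
if vertex $i \in V_{n_i} \cap Y_2$, then the expected column $\bar a_i$ satisfies
\[ \bar a_i(j) = \begin{cases} \frac{a}{n} & \text{if } j \in V_{n_i} \cap Z \\ \frac{b}{n} & \text{otherwise.} \end{cases} \]
We let the vector $\bar a$ be
\[ \bar a(j) := \frac{a+b}{2n} \quad \text{if } j \in Z, \]
and $b_i = \bar a_i - \bar a$. We therefore have
\[ b_i(j) = \begin{cases} \frac{a-b}{2n} & \text{if } j \in V_{n_i} \cap Z \\ \frac{b-a}{2n} & \text{otherwise.} \end{cases} \]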

Since both $\bar a_i, \bar a$ are in the column span of $\bar A_1$, we have for all $i$
\[ b_i = P_{\bar W} b_i. \]
We also note that $\|b_i\| = \frac{a-b}{2\sqrt{2n}}$. Therefore, if we can recover $b_i$, we can identify the set
$V_{n_i} \cap Z$. We now argue that we can recover $b_i$ approximately. Since $a_i - \bar a = b_i + e_i$, we have
\begin{align*}
P_W(a_i - \bar a) &= P_W b_i + P_W e_i \\
&= P_{\bar W} b_i + P_W e_i + \mathrm{err}_i \\
&= b_i + P_W e_i + \mathrm{err}_i,
\end{align*}
where $\mathrm{err}_i = (P_W - P_{\bar W}) b_i$. Since $\sin\angle(\bar W, W) \le \delta_1$, we have for any unit vector $v$, $\|P_W v - P_{\bar W} v\| \le \delta_1$,
which in turn implies for all $i$
\[ \|\mathrm{err}_i\| \le \delta_1 \|b_i\|. \]
Therefore, it is enough to bound $\|P_W e_i\|$. We recall that $k$ is a constant that does not depend
on $n$. $W$ is a $k$-dimensional space, giving $\mathbb{E}\|P_W e_i\|^2 \le k\sigma^2$. By Markov's inequality, it follows
that
\[ \mathbb{P}\left(\|P_W e_i\| > 2\sigma k^{1/2}\right) \le \frac{1}{4}. \]
By a simple application of the Chernoff bound, we have

Lemma B.3 With probability at least $1 - o(1)$, at least $m/2$ of the vectors $e_{i_1}, \dots, e_{i_m}$ satisfy
\[ \|P_W e_{i_j}\| < 2\sigma k^{1/2}. \]

Let $m_1 \ge m/2$ denote the number of such vectors, henceforth referred to as good vectors. To avoid
introducing extra notation, let us say $e_{i_1}, \dots, e_{i_{m_1}}$ are the good vectors and refer to the corresponding indices
as good indices. Note that $\sigma \le \sqrt{a/n}$. For any $\delta_2 > 0$, there exists a big enough constant $C_1$ such
that if $(a - b) > C_1\sqrt{ka}$, we have that $2\sigma k^{1/2} < \delta_2\|b_{i_j}\|$ whenever $i_j$ is good. Therefore

Lemma B.4 Given any $\delta > 0$, there exist constants $C_1, C_2$ such that the following holds. If
$(a - b) > C_1\sqrt{ka}$ and $a \ge b \ge C_2$, then for all good indices $i_j$, it holds that
$\|P_W(a_{i_j} - \bar a) - b_{i_j}\| < \delta\|b_{i_j}\|$.


Let $U'_{i_j}$ be the top $n/2k$ coordinates of the projected vector $P_W(a_{i_j} - \bar a)$. If we choose the
constants $C_1, C_2$ appropriately, then for every good index $i_j$, $U'_{i_j}$ contains a 0.95 fraction of the
vertices in $V_{n_{i_j}} \cap Z$.

Lemma B.5 then implies that when we throw away half of the sets $U'_{i_1}, \dots, U'_{i_m}$ with the least
Blue edge densities, then each of the remaining sets intersects some $V_{n_i} \cap Z$ in a 0.9 fraction of the
vertices.
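The selection of a candidate set from a projected column can be sketched as follows. The names `top_coordinates` and `overlap_fraction` and the toy vector are assumptions for the illustration; the projection itself is taken as given.

```python
def top_coordinates(vec, m):
    """Indices of the m largest coordinates of vec."""
    order = sorted(range(len(vec)), key=lambda j: vec[j], reverse=True)
    return set(order[:m])

def overlap_fraction(candidate, block):
    """Fraction of the candidate set that lies in the given block."""
    return len(candidate & block) / len(candidate)

# Toy projected vector: close to +1 on the block {0, 1, 3} and to -1 elsewhere.
projected = [0.9, 1.1, -1.0, 1.0, -0.8, -1.2]
candidate = top_coordinates(projected, 3)
fraction = overlap_fraction(candidate, {0, 1, 3})
```

When the projected vector is close to $b_{i_j}$, which is positive exactly on $V_{n_{i_j}} \cap Z$, the top coordinates recover most of that block.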


Lemma B.5 There exists a constant $c > 0$ such that the following holds. Suppose we are given a
set $X \subset Z$ of size $|X| = n/2k$. If for all $i \in \{1, \dots, k\}$
\[ |X \cap V_i| \le 0.9|X|, \]
then with probability at least $1 - e^{-cn}$ the number of Blue edges in the graph induced by $X$ is at
most $an/16k - 0.09(a-b)n/16k$. Conversely, if
\[ |X \cap V_i| \ge 0.95|X| \]
for some $i \in \{1, \dots, k\}$, then with probability at least $1 - e^{-cn}$ the number of Blue edges in the
graph induced by $X$ is at least $an/16k - 0.09(a-b)n/16k$.


Proof
Let $e(X)$ denote the number of Blue edges in the graph induced by vertices in $X$. Suppose
$|X \cap V_i| \le 0.9|X|$ for all $i \in \{1, \dots, k\}$. Then
\[ \mathbb{E}e(X) \le an/16k - 0.09(a-b)n/8k. \]
To bound the probability that $e(X) \ge an/16k - 0.045(a-b)n/8k$, we can use the Chernoff bound
with $t = 0.045(a-b)n/8k$:
\[ \mathbb{P}\left(e(X) \ge an/16k - 0.045(a-b)n/8k\right) \le \exp\left(-\frac{(0.045(a-b)n/8k)^2}{2an/16k + 0.045(a-b)n/8k}\right). \]
Similarly, suppose $|X \cap V_i| \ge 0.95|X|$ for some $i \in \{1, \dots, k\}$. Then
\[ \mathbb{E}e(X) \ge an/16k - 0.05(a-b)n/16k. \]
To bound the probability that $e(X) \le an/16k - 0.045(a-b)n/8k$, we can use the Chernoff bound
with $t = 0.04(a-b)n/16k$:
\[ \mathbb{P}\left(e(X) \le an/16k - 0.045(a-b)n/8k\right) \le \exp\left(-\frac{(0.04(a-b)n/16k)^2}{2an/16k + 0.04(a-b)n/16k}\right). \]

B.2 Proof of Lemma 4.2

For notational convenience, let $U_i := Z \cap V_i$. We will use the following large deviation result (see, e.g.,
page 36 in [BLM13]) repeatedly.

Lemma B.6 (Chernoff) If $X$ is a sum of $n$ iid indicator random variables with mean at most
$\rho \le 1/2$, then for any $t > 0$
\[ \max\{\mathbb{P}(X \ge \mathbb{E}X + t),\ \mathbb{P}(X \le \mathbb{E}X - t)\} \le \exp\left(-\frac{t^2}{2\,\mathrm{Var}X + t}\right) \le \exp\left(-\frac{t^2}{2n\rho + t}\right). \]

By Theorem stepl, there is a constant C such that if > CJ then by running Spectral 

Partition on the Red graph, we obtain, with probability 1 — o(l), sets U[,U' k , where 


\[ |U_i' \setminus U_i| \le 0.1\,n/2k. \]

In the rest, we condition on this event. The probabilities we talk about in this section are
based on the edges that go between vertices in $Z$.

Consider $e = (u, v)$. If $u \in U_i$, $v \in U_j$
with $i \ne j$, then $e$ is a Red edge with probability
\[ \mu := b/2n. \]
Similarly, if $u, v \in U_i$, then $e$ is a Red edge with probability
\[ \tau := a/2n. \]
For any $u \in U_1$, the number of its neighbors in $U_j'$, $j \ne 1$, is at most
\[ S_{1j}(u) := \sum_{i=1}^{0.9n/2k} \xi_i^u + \sum_{l=1}^{0.1n/2k} \zeta_l^u, \]
where $\xi_i^u$ are iid indicator variables with mean $\mu$ and $\zeta_l^u$ are iid indicator variables with mean $\tau$.
Similarly, for any $u \in U_1$, the number of its neighbors in $U_1'$ is at least
\[ S_{11}(u) := \sum_{i=1}^{0.9n/2k} \zeta_i^u + \sum_{l=1}^{0.1n/2k} \xi_l^u. \]

i =1 3 = 1 

After the correction sub-routine, if a vertex $u \in U_1$ is mislabeled then one of the following
holds:

• $S_{1j} \ge \frac{a+b}{8k}$ for some $j \ne 1$
• $S_{11} \le \frac{a+b}{8k}$

By an application of the Chernoff bound, the probability that $S_{11} \le \frac{a+b}{8k}$ can be bounded by
$\rho_1 = \exp\left(-0.04\frac{(a-b)^2}{k(a+b)}\right)$. Similarly, for any fixed $j \ne 1$, the probability that $S_{1j} \ge \frac{a+b}{8k}$ is bounded by $\rho_1$. Therefore,
the probability that any of these happens is bounded by $k\rho_1$. Therefore, the number of vertices in $U_1$
that will be misclassified after the correction step is at most
\[ M := \sum_{l=1}^{n/2k} \Gamma_l, \]
where $\Gamma_l$ are iid indicator random variables with mean $k\rho_1$.

\[ \mathbb{E}M \le \frac{n}{2k}\,k\exp\left(-0.04\frac{(a-b)^2}{k(a+b)}\right). \]
Applying Chernoff's bound with $t := \frac{n}{2}\exp\left(-0.04\frac{(a-b)^2}{k(a+b)}\right)$, we conclude that with probability $1 - o(1)$
\[ M \le \mathbb{E}M + t \le n\exp\left(-0.04\frac{(a-b)^2}{k(a+b)}\right). \]
This implies that with probability $1 - o(1)$, the number of mislabeled vertices in $U_1$ is
\[ \le n\exp\left(-0.04\frac{(a-b)^2}{k(a+b)}\right). \]
Set
\[ \gamma := 2k\exp\left(-0.04\frac{(a-b)^2}{k(a+b)}\right). \]
Therefore, by a union bound over all $i$, we have that with probability $1 - o(1)$ the output $U_1', U_2', \dots, U_k'$
after the correction step forms a $\gamma$-correct partition, with $\gamma$ satisfying
\[ \frac{(a-b)^2}{k(a+b)} = \frac{1}{0.04}\log\frac{2k}{\gamma} = 25\log\frac{2k}{\gamma}, \]
proving our claim.


B.3 Proof of Lemma 4.3


In this section, we show how we can merge $V_i \cap Y$ with $V_i \cap Z$ based on the Blue edges that go in
between vertices in $Y$ and $Z$. We can assume that we are given a $\gamma$-correct partition $U_1', \dots, U_k'$
of $U_1, \dots, U_k$. Now we label the vertices in $Y$ according to their degrees to $U_i'$ as given in the
Merge routine. Let us assume $\gamma \le 0.1$. In the rest, we condition on this event, and the event that
the maximum Red degree of a vertex is at most $\log^2 n$, which occurs with probability $1 - o(1)$.

Now we use the Blue edges. Consider $e = (u, v)$. If $e$ is not a Red edge, and $u \in V_i \cap Y$,
$v \in V_j \cap Z$ with $i \ne j$, then $e$ is a Blue edge with probability
\[ \mu := \frac{b/2n}{1 - \frac{b}{2n}}. \]
Similarly, if $e$ is not a Red edge, and $u \in V_i \cap Y$, $v \in V_i \cap Z$, then $e$ is a Blue edge with
probability
\[ \tau := \frac{a/2n}{1 - \frac{a}{2n}}. \]

Thus, for any $u \in Y \cap V_i$, the number of Blue neighbors in $U_j'$, $j \ne i$, is at most
\[ S_{ij} := \sum_{l=1}^{0.9n/2k} \xi_l^u + \sum_{l=1}^{0.1n/2k} \zeta_l^u, \]
where $\xi_l^u$ are iid indicator variables with mean $\mu$ and $\zeta_l^u$ are iid indicator variables with mean $\tau$.
Similarly, for any $u \in Y \cap V_i$, the number of Blue neighbors in $U_i'$ is at least
\[ S_{ii} := \sum_{l=1}^{0.9n/2k - d(u)} \zeta_l^u + \sum_{l=1}^{0.1n/2k} \xi_l^u. \]

After the correction sub-routine, if a vertex $u$ in $Y \cap V_i$ is misclassified then one of the following
holds:

• $S_{ij} \ge \frac{a+b}{8k}$ for some $j \ne i$
• $S_{ii} \le \frac{a+b}{8k}$

Let $\rho$ be the probability that at least one of the above events happens. Then the number of
mislabeled vertices in $Y$ is at most
\[ M := \sum_{l=1}^{n/2} \Gamma_l, \]

where $\Gamma_l$ are iid indicator random variables with mean $\rho$. First we use the Chernoff bound to estimate
$\rho$. Consider
\[ \rho_1 := \mathbb{P}\left(S_{ij}(u) \ge \frac{a+b}{8k}\right). \]
By definition, we have
\begin{align*}
\mathbb{E}S(u) &= 0.9n\mu/2k + 0.1n\tau/2k \\
&= 0.9n\left(\frac{b/4kn}{1 - \frac{b}{2n}}\right) + 0.1n\left(\frac{a/4kn}{1 - \frac{a}{2n}}\right) \tag{2.4} \\
&= 0.9\frac{b}{4k} + 0.1\frac{a}{4k} + 0.9\frac{b}{4k}\left(\frac{1}{1 - b/2n} - 1\right) + 0.1\frac{a}{4k}\left(\frac{1}{1 - a/2n} - 1\right).
\end{align*}
Set
\[ t := \frac{a+b}{8k} - \mathbb{E}S(u), \]
we have
\[ t = 0.1\frac{a-b}{k} - 0.9\frac{b}{4k}\left(\frac{1}{1 - b/2n} - 1\right) - 0.1\frac{a}{4k}\left(\frac{1}{1 - a/2n} - 1\right) \ge 0.1\frac{a-b}{k} - 0.9\frac{b^2}{4kn} - 0.1\frac{a^2}{4kn} \ge 0.09\frac{a-b}{k}, \]
for any sufficiently large $n$.

Applying Chernoff's bound, we obtain
\begin{align*}
\rho_1 &\le \exp\left(-\frac{(0.09(a-b))^2}{k(0.9b/2 + 0.1a/2) + 0.09k(a-b)}\right) \\
&\le \exp\left(-\frac{0.0324(a-b)^2}{k(a+b)}\right).
\end{align*}
By a similar argument, we get the same bound for
\[ \mathbb{P}\left(S_{ii} \le \frac{a+b}{8k}\right). \]
Therefore, by a union bound, we have that
\[ \rho \le k\exp\left(-\frac{0.0324(a-b)^2}{k(a+b)}\right). \]
Thus, we can conclude that
\[ \mathbb{E}M \le \frac{n}{2}k\exp\left(-0.0324\frac{(a-b)^2}{k(a+b)}\right). \]
Applying Chernoff's bound with $t := \frac{n}{2}k\exp\left(-0.0324\frac{(a-b)^2}{k(a+b)}\right)$, we conclude that with probability
$1 - o(1)$
\[ M \le \mathbb{E}M + t \le nk\exp\left(-0.0324\frac{(a-b)^2}{k(a+b)}\right). \]
This implies that with probability $1 - o(1)$, the number of mislabeled vertices in $Y$ is bounded
by
\[ nk\exp\left(-0.0324\frac{(a-b)^2}{k(a+b)}\right). \]
Set
\[ \gamma := 2k\exp\left(-0.0324\frac{(a-b)^2}{k(a+b)}\right). \]
We have, with probability $1 - o(1)$, a $\gamma$-correct partition of the vertices in $Y$, with $\gamma$ satisfying
\[ \frac{(a-b)^2}{k(a+b)} = \frac{1}{0.0324}\log\frac{2k}{\gamma} \le 31\log\frac{2k}{\gamma}, \]
proving our claim.
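The constants 25 and 31 in the two $k$-block bounds follow from simple arithmetic, which the sketch below reproduces. The function names are hypothetical, and the factor $0.25$ is an assumption used for the check: it reflects $k(0.9b/2 + 0.1a/2) + 0.09k(a-b) = k(0.36b + 0.14a) \le 0.25k(a+b)$ for $a > b$.

```python
import math

def k_block_gamma(a, b, k, rate):
    """gamma = 2k exp(-rate (a-b)^2 / (k (a+b)))."""
    return 2.0 * k * math.exp(-rate * (a - b) ** 2 / (k * (a + b)))

def k_block_required_gap(gamma, k, rate):
    """Value of (a-b)^2 / (k (a+b)) that yields a given gamma."""
    return math.log(2.0 * k / gamma) / rate

# Exponent rate in the merge step: t >= 0.09 (a-b)/k, denominator <= 0.25 (a+b)/k.
merge_rate = 0.09 ** 2 / 0.25

# Round trip for a = 60, b = 10, k = 3, where (a-b)^2 / (k (a+b)) = 2500/210.
gap = k_block_required_gap(k_block_gamma(60.0, 10.0, 3, 0.04), 3, 0.04)
```

With rate $0.04$ the inverse constant is exactly 25, and with rate $0.0324$ it is about 30.9, which is rounded up to 31 in the text.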

C Censor Block Model 

All we have to do now is to bound $\|E\|$. Let $\sigma^2 := p \ge \mathrm{Var}(\xi_{ij})$ for all $(i, j)$. $Y_0$ is obtained by
zeroing out rows and columns of $Y$ of high degree. We then have the following lemma. The proof
is essentially the same as that of Corollary 3.4, so we skip the details.

Lemma C.1 Let $0 < \epsilon_0 \le \epsilon < \frac{1}{2}$. Then there exist constants $C, C_1$ such that if $p \ge \frac{C}{n}$, then with
probability $1 - o(1)$, $\|Y_0 - \bar Y\| \le C_1\sigma\sqrt{n} = C_1\sqrt{np}$.

Since the second eigenvalue of $\bar Y$ is $p(1 - 2\epsilon)n$, to make the angle between the eigenspace
spanned by the two eigenvectors corresponding to the top two eigenvalues small, we need to assume that
\[ \frac{p(1 - 2\epsilon)n}{\sqrt{np}} \]
is sufficiently large. The assumption
\[ np \ge \frac{C_2}{(1 - 2\epsilon)^2} \]
in Theorem 1.6 is precisely this.

D Proof of Lemma A.2


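This proof is essentially the same as that in [FO05]. Let us first define the following sets. For $\gamma_k := 2^k$,
\[ S_k := \left\{ i : \frac{\gamma_{k-1}}{\sqrt{n}} \le x_i < \frac{\gamma_k}{\sqrt{n}} \right\}, \quad s_k := |S_k|, \quad k = \left\lfloor \log\frac{\sqrt{d}}{n} \right\rfloor, \dots, 0, 1, 2, \dots, \lceil \log\sqrt{n} \rceil \]
and
\[ T_k := \left\{ i : \frac{\gamma_{k-1}}{\sqrt{n}} \le y_i < \frac{\gamma_k}{\sqrt{n}} \right\}, \quad t_k := |T_k|, \quad k = \left\lfloor \log\frac{\sqrt{d}}{n} \right\rfloor, \dots, 0, 1, 2, \dots, \lceil \log\sqrt{n} \rceil. \]
Further, we use the notation $\mu_{i,j} := s_i t_j \frac{d}{n}$ and $\lambda_{i,j} := e(S_i, T_j)/\mu_{i,j}$. We then have
\begin{align*}
\sum_H x_i A_{i,j} y_j &\le \sum_{(i,j) : \gamma_i\gamma_j \ge \sqrt{d}} \frac{\gamma_i}{\sqrt{n}} \frac{\gamma_j}{\sqrt{n}}\, e(S_i, T_j) \\
&= \sum_{(i,j) : \gamma_i\gamma_j \ge \sqrt{d}} \frac{\gamma_i\gamma_j}{n}\, \lambda_{i,j}\, s_i t_j \frac{d}{n} \\
&= \sqrt{d} \sum_{(i,j) : \gamma_i\gamma_j \ge \sqrt{d}} \frac{s_i\gamma_i^2}{n} \cdot \frac{t_j\gamma_j^2}{n} \cdot \frac{\lambda_{i,j}\sqrt{d}}{\gamma_i\gamma_j} \\
&= \sqrt{d} \sum \alpha_i \beta_j \sigma_{i,j}.
\end{align*}
In the last line, we have used the following notation: $\alpha_i := s_i\frac{\gamma_i^2}{n}$, $\beta_j := t_j\frac{\gamma_j^2}{n}$, $\sigma_{i,j} := \frac{\lambda_{i,j}\sqrt{d}}{\gamma_i\gamma_j}$. In
this notation, we can write (1.2) as follows:
\[ \sigma_{i,j}\,\alpha_i \log\lambda_{i,j} \le c_3 \frac{\gamma_i}{\gamma_j\sqrt{d}} \left[ 2\log\gamma_j + \log\frac{1}{\beta_j} \right]. \tag{4.5} \]
Now we bound $\sum \alpha_i\beta_j\sigma_{i,j}$ by a constant. We note that $\sum_i \alpha_i \le 4$ and $\sum_j \beta_j \le 4$.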
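The dyadic grouping of coordinates and the bound $\sum_i \alpha_i \le 4$ can be illustrated with a short Python sketch. The function names and the test vector are assumptions for the example; only positive coordinates are binned, and logarithms are taken base 2 to match $\gamma_k = 2^k$.

```python
import math

def dyadic_level(value, n):
    """Level k with 2**(k-1)/sqrt(n) <= value < 2**k/sqrt(n), for value > 0."""
    return math.floor(math.log2(value * math.sqrt(n))) + 1

def alpha_weights(x):
    """Map each level k to alpha_k = s_k * gamma_k**2 / n for the vector x."""
    n = len(x)
    alpha = {}
    for value in x:
        if value > 0:
            k = dyadic_level(value, n)
            # Each coordinate in level k contributes gamma_k**2 / n = 4**k / n.
            alpha[k] = alpha.get(k, 0.0) + 4.0 ** k / n
    return alpha

raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
norm = math.sqrt(sum(r * r for r in raw))
x = [r / norm for r in raw]          # a unit vector with positive entries
alpha = alpha_weights(x)
total = sum(alpha.values())
```

Since $\gamma_k/\sqrt{n}$ overestimates each coordinate in level $k$ by a factor of at most 2, `total` lies between $\|x\|^2 = 1$ and $4\|x\|^2 = 4$.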


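We now consider 6 cases.

1. $\sigma_{i,j} \le 1$:
\begin{align*}
\sqrt{d}\sum \alpha_i\beta_j\sigma_{i,j} &\le \sqrt{d}\sum \alpha_i\beta_j \\
&\le \sqrt{d}\left(\sum_i \alpha_i\right)\left(\sum_j \beta_j\right) \\
&\le 16\sqrt{d}.
\end{align*}

2. $\lambda_{i,j} \le c_2$: Since $\gamma_i\gamma_j \ge \sqrt{d}$, we have in this case $\sigma_{i,j} \le c_2$. Therefore,
\begin{align*}
\sqrt{d}\sum \alpha_i\beta_j\sigma_{i,j} &\le \sqrt{d}\sum \alpha_i\beta_j c_2 \\
&\le 16c_2\sqrt{d}.
\end{align*}

3. $\gamma_i > \sqrt{d}\,\gamma_j$: Since the maximum degree is at most $c_1 d$, we have that $\lambda_{i,j} \le c_1 n/t_j$. Therefore,
\begin{align*}
\sqrt{d}\sum \alpha_i\beta_j\sigma_{i,j} &= \sqrt{d}\sum_i \left( \alpha_i \sum_{j : \gamma_i > \sqrt{d}\gamma_j} \beta_j \frac{\lambda_{i,j}\sqrt{d}}{\gamma_i\gamma_j} \right) \\
&\le \sqrt{d}\sum_i \left( \alpha_i \sum_{j : \gamma_i > \sqrt{d}\gamma_j} t_j\frac{\gamma_j^2}{n} \cdot \frac{(c_1 n/t_j)\sqrt{d}}{\gamma_i\gamma_j} \right) \\
&= \sqrt{d}\sum_i \left( \alpha_i \sum_{j : \gamma_i > \sqrt{d}\gamma_j} c_1\frac{\gamma_j\sqrt{d}}{\gamma_i} \right) \\
&\le \sqrt{d}\sum_i (\alpha_i c_1 \times 2) \\
&\le 2c_1\sqrt{d}\sum_i \alpha_i \\
&\le 8c_1\sqrt{d}.
\end{align*}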

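4. We now assume that we are not in cases 1 to 3. Therefore, we can assume that (1.2) holds. We
consider the following sub cases.

(a) $\log\lambda_{i,j} > (1/4)[2\log\gamma_j + \log(1/\beta_j)]$: (4.5) implies that $\sigma_{i,j}\alpha_i \le 4c_3\gamma_i/(\gamma_j\sqrt{d})$.
Therefore,
\begin{align*}
\sqrt{d}\sum \alpha_i\beta_j\sigma_{i,j} &= \sqrt{d}\sum_j \left( \beta_j \sum_i \alpha_i\sigma_{i,j} \right) \\
&\le \sqrt{d}\sum_j \left( \beta_j \sum_{i : \gamma_i \le \sqrt{d}\gamma_j} 4c_3\frac{\gamma_i}{\gamma_j\sqrt{d}} \right) \\
&\le \sqrt{d}\sum_j \beta_j \times 8c_3 \\
&\le 32c_3\sqrt{d}.
\end{align*}
Above we made use of the fact that we are not in case 3, and that $\sum_{i : \gamma_i \le \sqrt{d}\gamma_j} 4c_3\frac{\gamma_i}{\gamma_j\sqrt{d}}$
is a geometric sum.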

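(b) $2\log\gamma_j \ge \log(1/\beta_j)$: We can assume we are not in case (a), and hence $\lambda_{i,j} \le \gamma_j$.
Combined with the fact that we are not in case 1, we have that $\gamma_i \le \sqrt{d}$. Since we
are not in case 2, we can assume that $\log\lambda_{i,j} \ge 1$ and hence $\sigma_{i,j}\alpha_i \le 4c_3\frac{\gamma_i}{\gamma_j\sqrt{d}}\log\gamma_j$.
Therefore,
\begin{align*}
\sqrt{d}\sum \alpha_i\beta_j\sigma_{i,j} &= \sqrt{d}\sum_j \left( \beta_j \sum_{i : \gamma_i\gamma_j \ge \sqrt{d}} \alpha_i\sigma_{i,j} \right) \\
&\le \sqrt{d}\sum_j \left( \beta_j \sum_{i : \gamma_i\gamma_j \ge \sqrt{d}} 4c_3\frac{\gamma_i}{\sqrt{d}\gamma_j}\log\gamma_j \right) \\
&\le \sqrt{d}\sum_j \left( \beta_j \cdot 4c_3 \sum_{i : \gamma_i \le \sqrt{d}} \frac{\gamma_i}{\sqrt{d}} \right) \\
&\le \sqrt{d}\sum_j \beta_j \times 8c_3 \\
&\le 32c_3\sqrt{d}.
\end{align*}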


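(c) $2\log\gamma_j < \log(1/\beta_j)$: Since we are not in (a), we have $\log\lambda_{i,j} \le \log\frac{1}{\beta_j}$. It follows that
\[ \frac{\lambda_{i,j}\sqrt{d}}{\gamma_i\gamma_j} \le \frac{1}{\beta_j}\cdot\frac{\sqrt{d}}{\gamma_i\gamma_j}. \]
Therefore:
\begin{align*}
\sqrt{d}\sum \alpha_i\beta_j\sigma_{i,j} &= \sqrt{d}\sum_i \left( \alpha_i \sum_{j : \gamma_i\gamma_j \ge \sqrt{d}} \beta_j\sigma_{i,j} \right) \\
&= \sqrt{d}\sum_i \left( \alpha_i \sum_{j : \gamma_i\gamma_j \ge \sqrt{d}} \beta_j\frac{\lambda_{i,j}\sqrt{d}}{\gamma_i\gamma_j} \right) \\
&\le \sqrt{d}\sum_i \left( \alpha_i \sum_{j : \gamma_i\gamma_j \ge \sqrt{d}} \frac{\sqrt{d}}{\gamma_i\gamma_j} \right) \\
&\le \sqrt{d}\sum_i (\alpha_i \times 2)
\end{align*}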